ABSTRACT

Human computation is a computing approach that draws upon human cognitive abilities to solve computational tasks for 
which there are so far no satisfactory fully automated solutions even when using the most advanced computing technologies
available. Human computation for citizen science projects consists in designing systems that allow large crowds of
volunteers to contribute to scientific research by executing human computation tasks. Examples of successful projects are 
Galaxy Zoo and Foldit. A key feature of this kind of project is its capacity to engage volunteers. An important requirement 
for the proposal and evaluation of new engagement strategies is having a clear understanding of the typical engagement of 
the volunteers; however, even though several projects of this kind have already been completed, little is known about this 
issue. In this paper, we investigate the engagement pattern of the volunteers in their interactions in human computation for 
citizen science projects, how they differ among themselves in terms of engagement, and how those volunteer engagement 
features should be taken into account for establishing the engagement encouragement strategies that should be brought into 
play in a given project. To this end, we define four quantitative engagement metrics to measure different aspects of volunteer
engagement, and use data mining algorithms to identify the different volunteer profiles in terms of the engagement metrics.
Our study is based on data collected from two projects: Galaxy Zoo and The Milky Way Project. The results show that
the volunteers in such projects can be grouped into five distinct engagement profiles that we label as follows: hardworking,
spasmodic, persistent, lasting, and moderate. The analysis of these profiles provides a deeper understanding of the nature of
volunteers’ engagement in human computation for citizen science projects.

1. INTRODUCTION

Human computation is a computing approach based on harnessing human cognitive abilities to solve computational tasks for
which there are so far no satisfactory fully automated solutions even when using the most advanced computing technologies
currently available (Quinn and Bederson, 2011). Examples of such tasks may be found in the areas of natural language
processing, image understanding, and creativity. They often arise in scientific applications related to disciplines
such as biology, linguistics, and astronomy (Wiggins and Crowston, 2012; Lintott and Reed, 2013). As a result, it has
become common among scientists to start projects to recruit ordinary people for executing human computation tasks, which
we call human computation for citizen science projects. Citizen science can be broadly defined as a partnership between
scientists and ordinary people willing to contribute to an authentic scientific research effort (Cohn, 2008; Dickinson et al.,
2012; Lintott and Reed, 2013). A large range of activities can be carried out by ordinary people in citizen science
(Goodchild, 2007; Cohn, 2008; Wiggins and Crowston, 2012). Those activities may require only some simple abilities, such as
data collecting and reporting, or more complex cognitive abilities such as data aggregation and classification. In human
computation for citizen science projects, participants contribute by executing tasks that require cognitive abilities. Examples
of projects with this feature are Galaxy Zoo (Lintott et al., 2008) and Foldit (Cooper et al., 2010).

The contribution behaviour of people taking part in this type of project can be examined in the light of two different research 
approaches centered on the notions of voluntarism (Clary et al., 1998; Wilson, 2000) and human engagement (O’Brien and 
Toms, 2008; Simpson, 2009; Lehmann et al., 2012). Voluntarism literature usually distinguishes between two different types 
of contribution behaviour: helping activity behaviour and volunteerism behaviour (Clary et al., 1998; Wilson, 2000). Helping 
activity behaviour designates a form of sporadic participation in which the individual is faced with an unexpected request 
to help someone to do something. Volunteerism behaviour, on the other hand, refers to a kind of planned behaviour.
Volunteers are usually actively seeking out opportunities to help others. They typically commit themselves to an ongoing 
relationship at considerable personal cost in terms of dedicated time or cognitive effort. Drawing this distinction between 
helping activity and voluntarism seems to us to be important also in the context of human computation for citizen science 
projects. A recent characterization of the behaviour of volunteers in such projects brings to light the existence of two main 
groups of participants: transient and regular (Ponciano et al., 2014b). Transient participants exhibit a helping behaviour, 
whereas the behaviour of regular participants fits into the definition of volunteerism. Not surprisingly, volunteers typically 
constitute a minority among the participants, and execute the largest part of tasks in the project. Thus, a key feature for 
the success of a human computation for citizen science project is the capacity to foster this kind of sustained contribution
behaviour. 

Fostering sustained contribution behaviour is an issue that has been widely addressed in human engagement studies. Current 
literature on human engagement focuses on the human behaviour when individuals are self-investing personal resources 
such as time, physical energy, and cognitive power (Bakker and Demerouti, 2008; O’Brien and Toms, 2008; Simpson, 2009; 
Lehmann et al., 2012; McCay-Peet et al., 2012). Studies in this area usually focus on both qualitative and quantitative 
dimensions of engagement by (i) analysing the psychological factors behind engagement/disengagement such as motivation, 
satisfaction, and frustration; and (ii) measuring the level of engagement quantitatively in terms of the degree of contribution 
and the duration of the contribution. 

Several studies have been devoted to the understanding of psychological factors of volunteer engagement in human
computation for citizen science projects (Raddick et al., 2010; Rotman et al., 2012; Jennett et al., 2014; Nov et al., 2014), while
few studies have focused on quantitative estimation of the level of engagement of the volunteers (Ponciano et al., 2014b).
The lack of studies with this perspective is an important constraint because a fundamental requirement for proposing and 
evaluating new engagement strategies is having a clear understanding of how volunteers typically behave in such situations. 
This study aims at filling this gap by providing a quantitative analysis of the nature of engagement of volunteers by using log 
data related to their execution of tasks. Three research questions are addressed in this study: 1) how engaged the volunteers 
are during their interaction with the project; 2) what similarities and differences they exhibit among themselves in terms of 
engagement; and 3) how the engagement characteristics of the volunteers can be exploited for establishing the engagement 
strategies to be implemented in a given project. 

In order to answer these questions, we go through existing human engagement studies and, based on the concepts and 
theories put forward, we propose the following four metrics to measure the level of engagement of each volunteer: activity 
ratio, relative activity duration, daily devoted time, and variation in periodicity. Activity ratio is a measure of the return 
rate of the volunteer to the project during the period that he/she stays contributing to it. Daily devoted time is a measure 
of the length of the daily engagement. Relative activity duration, in turn, is a measure of the duration of the volunteer’s 
long-term engagement. Finally, variation in periodicity informs us about the deviation in the periodicity with which the 
volunteer executes tasks in the project. By using hierarchical and k-means algorithms, we cluster the volunteers according 
to the values of their engagement metrics in order to find out the different engagement profiles that arise from their natural 
behaviour within the project. 

We analyse volunteer engagement profiles according to the data collected from two popular projects hosted at the Zooniverse 
platform: Galaxy Zoo and The Milky Way Project. These projects ran for almost 2 years between 2010 and 2012 and involved 
more than one billion executed tasks and thousands of participants, which turns them into valuable sources for the analysis of 
a wide range of engagement aspects of the volunteers. In both projects, we found 5 different clusters of volunteers based on 
visual inspection and statistical measures. Each cluster stands for a distinct engagement profile brought forth by the behaviour
shown by the volunteers during their participation in the projects. The distinct engagement profiles brought to light in this
way are labelled as: hardworking, spasmodic, persistent, lasting, and moderate.

Hardworking engagement is characterised by larger activity ratio, low variation in periodicity and shorter relative activity
duration. Volunteers who exhibit this type of engagement profile typically work hard and regularly when arriving at the
project, but may leave the project quickly. Spasmodic engagement is distinguished by a relatively high activity ratio and
moderate variation in periodicity. Volunteers who exhibit this engagement profile provide an intense contribution, over a short
period of time and with irregular periodicity within this period. Persistent engagement, in turn, is characterised by a larger
activity duration and low activity ratio. Volunteers who exhibit a persistent engagement profile remain in the project for a
long period of time but contribute only a few days within this time period. Lasting engagement, in turn, is characterised
by an engagement pattern similar to persistent engagement, with the difference that volunteers exhibit here a much shorter
activity duration. Finally, moderate volunteers have intermediate scores in all categories of engagement metrics.

Regarding the distribution of the volunteers per profile, the highest percentage of volunteers (30% in The Milky Way Project
and 31% in Galaxy Zoo) exhibits a moderate engagement profile, while few volunteers (13% in The Milky Way Project
and 16% in Galaxy Zoo) show persistent engagement. Given the total amount of human effort time required to execute all
the tasks in the project, the aggregate time devoted by volunteers who exhibit a persistent engagement profile accounts for
40% of total time in The Milky Way Project and 46% in Galaxy Zoo; this is the volunteer profile that stands for the largest
contribution.

The method we propose to measure the engagement of volunteers and set up engagement profiles has been shown to be
satisfactory in bringing to light the main similarities and differences among the volunteers. The fact that the results thus
obtained are consistent throughout different projects strengthens the thesis that engagement profiles can arise in various
other projects. Several other discussions can be drawn from our analysis. For example, the engagement profiles enable
the development of new recruitment strategies to attract volunteers with a desired engagement profile as well as the design
of personalised engagement strategies that focus on improving specific engagement metrics. Finally, our results call for
further theoretical and qualitative studies that investigate the motivation of volunteers in the light of the distinct engagement
profiles they may exhibit. The combination of a quantitative analysis of volunteer engagement and the psychological factors
established in qualitative studies will advance our comprehension about the engagement patterns of volunteers in human
computation and citizen science.

In this study we put forward three main contributions. First, we propose four metrics to measure the level of engagement
of volunteers with regard to both the duration of the period of engagement with the project and the degree of engagement
during this period. Furthermore, we provide a deeper quantitative assessment of volunteer engagement profiles derived from
two popular human computation for citizen science projects. To the best of our knowledge, this is the first study assessing
natural engagement profiles in volunteer task execution behaviour in this type of project. Finally, this study allows us to go
beyond previous studies by covering a larger number of volunteers and bringing forth engagement aspects which have so far
not been identified in studies focusing on qualitative methodologies.

The rest of this work is organised as follows. We provide first a background of human engagement studies and discuss
relevant previous work. Next we describe our method to measure the volunteer engagement and identify engagement profiles.
Finally, we present an analysis of volunteer engagement in Galaxy Zoo and The Milky Way Project.

2. BACKGROUND AND RELATED WORK 

This study builds on a broad set of studies covering volunteer engagement, human computation and citizen science projects.
In this section, we first provide a background to the subject of human engagement. Thereafter, we discuss the related work.

2.1. What is engagement and how to approach it 

The subject of human engagement has been studied within a variety of disciplines, such as education (Meece et al., 1988),
management science (Simpson, 2009) and computer science (O’Brien and Toms, 2008). Some studies make an attempt to
conceptualize the term engagement in an interdisciplinary perspective (Gonzalez-Roma et al., 2006; Bakker and Demerouti,
2008; O’Brien and Toms, 2008; Simpson, 2009; Lehmann et al., 2012; McCay-Peet et al., 2012). A consensus that emerges
from these studies is that engagement means to participate in any enterprise by self-investing personal resources, such as
time, physical energy, and cognitive power.

O’Brien and Toms (2008) provide a conceptual framework to study human engagement with technology. This framework 
establishes that the entire process of engagement comprises four stages: point of engagement, period of sustained
engagement, disengagement and reengagement. The point of engagement is the time at which the human performs the first action
in the system. The period of sustained engagement is the continuous period of time in which he/she keeps on performing
actions in the system. Disengagement occurs when the period of sustained engagement ends. Finally, reengagement denotes
new engagement cycles composed of the first three stages. Studies of this process involve at least four dimensions:
type of engagement, psychological factors of engagement, duration of engagement, and degree of engagement. 

The type of engagement is defined by the kind of personal resources and skills that humans invest in performing an activity. 
Examples of types of engagement are social engagement (Porges, 2003) and cognitive engagement (Corno and Mandinach,
1983). Social engagement refers to actions that require humans to interact with others. It is widely studied in areas such 
as online social networks and communities (Preece, 2000; Millen and Patterson, 2002). Cognitive engagement refers to 
actions that require mainly human cognitive effort. It has been widely addressed in educational psychology and work 
engagement (Meece et al., 1988; Simpson, 2009).

The psychological factors of engagement are related to the motives leading to a point of engagement, disengagement and 
reengagement, such as motivation, satisfaction, perceived control, and frustration. Studies have proposed and/or instantiated 
various theories in order to construct a framework of theories that explain the psychological factors behind human
engagement (Gonzalez-Roma et al., 2006; O’Brien and Toms, 2008). These theories include the self-determination theory (Deci
and Ryan, 2000) and the self-efficacy theory (Bandura, 1977). The self-determination theory establishes that human
motivation can be broadly divided into intrinsic motivations, associated with inner personal reward, and extrinsic motivations,
associated with earning an external reward or avoiding a punishment. The self-efficacy theory, in turn, advances the idea that
perceived human efficacy determines if an individual will initiate an activity, how much effort will be expended, and how 
long the activity will be sustained. 

The duration of engagement measures the duration of the period of sustained engagement, sometimes called retention. It 
expresses how long a human keeps using the system. It is short-term engagement when it occurs during a relatively short
period of time (e.g. minutes or hours), and long-term engagement when it lasts a long period of time (e.g. months or 
years). In short-term engagement, the point of engagement is the point in time at which the individual performs the first 
action within the system, the period of engagement is the time span under which he/she keeps interacting with the system 
in a continuous working session, and the point of disengagement is the point in time at which the working session ends. In 
long-term engagement, the point of engagement is the point in time at which the individual performs the first action within 
the system, the period of engagement refers to the number of days under which she/he keeps on interacting with the system, 
and the point of disengagement refers to the day when he/she leaves the system. Thus, long-term engagement may consist 
of several short-term engagement cycles. 

Finally, the degree of engagement is a quantitative measure of the degree of participation during the period of sustained 
engagement. It can also be viewed as a measure of the amount of resources invested by humans in participating in the 
system. Measuring the degree of engagement has proven a challenging task. Some studies use surveys to collect information 
about how humans perceive their level of engagement and hence estimate their degree of engagement (e.g., O’Brien and 
Toms (2010); McCay-Peet et al. (2012)). Other studies use behavioural data stored in logs of the system to measure the 
degree of engagement (e.g. Lehmann et al. (2012)). 

2.2. Related work 

The dimensions of engagement presented in the last section are helpful for framing the previous studies on engagement. There
is an extensive body of work dealing with engagement in technology-mediated social participation systems (Kraut et al.,
2010) such as wiki-based systems (Butler et al., 2002; Bryant et al., 2005; Butler et al., 2008; Schroer and Hertel, 2009;
Preece and Shneiderman, 2009; Niederer and Van Dijck, 2010; Liu and Ram, 2011; Welser et al., 2011; Zhu et al., 2012),
open source software projects (Hertel et al., 2003; Niederer and Van Dijck, 2010), and human computation for citizen science
projects (Raddick et al., 2010; Rotman et al., 2012; Lopez et al., 2012; Mao et al., 2013; Jennett et al., 2014).

Wiki-based systems such as Wikipedia provide means that allow participants to engage in a broad range of activities, such
as the insertion of a sentence in an article, modification of an existing reference, reverting an article to a former version 
etc (Butler et al., 2008; Liu and Ram, 2011; Welser et al., 2011). Participants assume different roles in the system when 
some of them focus on performing a single type of activity, and others focus on performing other types of activities (Butler 
et al., 2008; Niederer and Van Dijck, 2010; Liu and Ram, 2011). Such roles characterise different types of engagement in the
system. The motivation of the participants and their perception of their own roles usually change as they become more active 
in the system (Bryant et al., 2005; Burke and Kraut, 2008; Schroer and Hertel, 2009; Preece and Shneiderman, 2009). Since 
such systems provide a collaborative environment, the behaviour of some of the participants may also affect the behaviour 
of others (Butler et al., 2002; Zhu et al., 2012). 

Studies on open source software (OSS) projects, in turn, have focused on understanding the psychological factors that lead 
participants to engage in OSS projects, and the kind of rewards they expect (Hertel et al., 2003; Roberts et al., 2006). For 
example, Hertel et al. (2003) show that psychological factors appeared to be similar to those behind voluntary action within 
social movements such as the civil rights, labour, and peace movements. Studies on Apache projects suggest that there 
are also interrelationships between motivation and degree of engagement (Roberts et al., 2006). Extrinsic motivation, such
as monetary reward and status within the system, leads to above-average contribution levels, while intrinsic motivations do not
significantly impact average contribution levels.

Differently from Wiki-based systems, in which there is a diversity of types of engagement, the role played by volunteers in
human computation for citizen science projects is mainly the execution of well defined human computation tasks, although
some projects allow volunteers to carry out social engagement activities, for instance interacting in forums (Fortson et al.,
2012; Luczak-Roesch et al., 2014). In such projects, as in the case of studies in wiki-based systems and OSS projects, the
psychological factor is the dimension of engagement that has received most attention (Raddick et al., 2010; Rotman et al.,
2012; Jennett et al., 2014; Nov et al., 2014).

Raddick et al. (2010) analyse the motivations of volunteers in the Galaxy Zoo project. It is shown that, among 12 categories
of motivations mentioned by the volunteers, the most mentioned category is interest in astronomy, which is the theme of
the project. Rotman et al. (2012) and Rotman et al. (2014) show that the motivation of volunteers changes dynamically
throughout the period of their contribution to the projects. Jennett et al. (2014) analyse factors that led volunteers to dabble
and/or drop out in the Old Weather project. The analysis shows that volunteers of this kind are less motivated, though they
care about the project and the quality of the work they perform. Thus, projects should be designed to encourage both dabbling
and commitment. Nov et al. (2014) analyse motivation factors that affect the quality and the quantity of contributions to
citizen science projects.

In general, these studies clarify several aspects of why volunteers engage in human computation for citizen science projects.
However, little progress has been made in terms of understanding how to measure volunteer engagement and to uncover
natural patterns in which the engagement occurs. This fact constitutes an important shortcoming because a key feature of
this kind of project is its capacity to engage volunteers. A clear understanding of how volunteers typically engage with such
kinds of projects is fundamental for proposing and evaluating new strategies to encourage engagement.

3. FINDING ENGAGEMENT PROFILES 

In this section, we first present the metrics proposed to measure the degree of engagement and the duration of engagement
of volunteers. Then, we present a strategy to cluster volunteers based on the values of these metrics. This
clustering allows the identification of profiles of volunteers exhibiting similar engagement patterns.

3.1. Measuring engagement 

We characterise volunteers according to how they score in different engagement metrics. Engagement metrics are measures
of volunteer interaction and involvement with the project. The engagement metrics proposed in this section are based on the
conceptual framework proposed by O’Brien and Toms (2008). By using this framework, we analyse the engagement over
time of volunteers taking into account their points of engagement, periods of sustained engagement, disengagements and
reengagements.

Figure 1 shows the structure of the time line of a volunteer during participation in a project. This figure shows five concepts
used in the calculations of our metrics: the time the volunteer could potentially remain linked to the project, the days the volunteer
remains linked to the project, the active days, the time devoted on an active day, and the number of days elapsed between
two active days. Our metrics are designed to measure the engagement of participants that exhibit an ongoing contribution
and have contributed on at least two different days. By doing so, we focus on participants that are more likely to fit into the
voluntarism definition (Clary et al., 1998; Wilson, 2000).

Figure 1. Structure of the time line of a volunteer in a project, highlighting the active days and working sessions on the
active days.

The time a volunteer $i$ can potentially remain linked to the project is the number of days elapsed between the day in which
the volunteer joined the project and the day in which the project is concluded. It is denoted by $w_i$ days. An active day of a
volunteer $i$ is a day on which this volunteer is active in the project. We consider that a volunteer is active on a particular day
if he/she executes at least one task during that day. We define $A_i$ as the sequence of dates in which the volunteer $i$ is active.
The time devoted on a specific active day is the sum of the time duration of the contribution sessions of the volunteer on that
active day. Contribution sessions are continuous short periods of time during which the volunteer keeps executing tasks. We
define $D_i$ as the multiset of the amount of time the volunteer $i$ devotes to the project on each active day. The time elapsed
between two active days is the number of days it took the volunteer to return to the project since the latest active day. We
define $B_i$ as the multiset of the number of days elapsed between every two sequential active days. Considering $w_i$, $A_i$, $D_i$ and
$B_i$, we can derive metrics to measure the degree and the duration of engagement of each volunteer.
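
As an illustration, the quantities above could be derived from a log of contribution sessions roughly as follows. This is
a minimal sketch: the log format (a start timestamp and a duration in hours per session), the function name `timeline`,
the attribution of a session to the day on which it starts, and the inclusive counting of $w_i$ are all assumptions made
for the example, not details given in the text.

```python
from collections import defaultdict
from datetime import date, datetime


def timeline(sessions, joined, project_end):
    """Derive (w, A, D, B) for one volunteer from a hypothetical session log.

    sessions: list of (start: datetime, hours: float) contribution sessions.
    joined, project_end: dates on which the volunteer joined and the project ended.
    """
    hours_per_day = defaultdict(float)
    for start, hours in sessions:
        # Assumption: a session counts towards the day on which it starts.
        hours_per_day[start.date()] += hours

    A = sorted(hours_per_day)                     # active days, in date order
    D = [hours_per_day[day] for day in A]         # hours devoted on each active day
    B = [(b - a).days for a, b in zip(A, A[1:])]  # days between consecutive active days
    # Assumption: days are counted inclusively, so that the relative
    # activity duration derived from w cannot exceed 1.
    w = (project_end - joined).days + 1
    return w, A, D, B


sessions = [
    (datetime(2010, 4, 1, 10), 0.5),
    (datetime(2010, 4, 1, 20), 0.25),
    (datetime(2010, 4, 4, 9), 1.0),
    (datetime(2010, 4, 10, 9), 0.5),
]
w, A, D, B = timeline(sessions, joined=date(2010, 4, 1), project_end=date(2010, 4, 30))
```

In this made-up example the volunteer has three active days, two sessions on the first of them, and gaps of 3 and 6 days
between consecutive active days.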

We define two metrics of degree of engagement: activity ratio and daily devoted time. Activity ratio ($a_i$) is the proportion
of days on which the volunteer was active in relation to the total of days he/she remained linked to the project. It can be
computed as $a_i = \frac{|A_i|}{(\max(A_i) - \min(A_i)) + 1}$, $a_i \in (0, 1]$. The closer to 1, the more assiduous the volunteer is during the time he/she
remained linked to the project. Daily devoted time ($d_i$) is the average number of hours the volunteer remains executing tasks on each
day he/she is active. It can be computed as $d_i = \mathrm{avg}(D_i)$, $d_i \in (0, 24]$. The higher the average, the longer the time the
volunteer devotes to the project executing tasks on the days he/she is active. Note that, because the human computation
projects usually consist of different time-consuming tasks, the time devoted by the volunteers executing tasks is a better
measure of their degree of engagement than the number of tasks they execute (Geiger and Halfaker, 2013; Ponciano et al.,
2014b).
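
The two degree-of-engagement metrics can be sketched in a few lines of Python. The function names and the sample
values below are hypothetical and serve only to make the definitions concrete.

```python
from datetime import date
from statistics import mean


def activity_ratio(active_days):
    """a_i: number of active days divided by the span (in days) they cover."""
    span = (max(active_days) - min(active_days)).days + 1
    return len(set(active_days)) / span


def daily_devoted_time(hours_per_active_day):
    """d_i: average number of hours devoted on the active days."""
    return mean(hours_per_active_day)


# Hypothetical volunteer: 3 active days within a 10-day span.
A = [date(2010, 4, 1), date(2010, 4, 4), date(2010, 4, 10)]
D = [0.75, 1.0, 0.5]

a = activity_ratio(A)       # 3 / 10
d = daily_devoted_time(D)   # (0.75 + 1.0 + 0.5) / 3
```

A volunteer who is active on every day of the span obtains an activity ratio of exactly 1.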

We also define two metrics to assess the duration of engagement: relative activity duration and variation in periodicity.
Relative activity duration ($r_i$) is the ratio of days during which a volunteer $i$ remains linked to the project in relation
to the total number of days elapsed since the volunteer joined the project until the project is over ($w_i$). It is defined as
$r_i = \frac{(\max(A_i) - \min(A_i)) + 1}{w_i}$, $r_i \in (0, 1]$. When $r_i = 1$, the volunteer remains linked to the project from the time she/he came to the project until
the project is completed. The closer to 1, the more persistent is the participation of the volunteer in the project. Variation
in periodicity ($v_i$) is the standard deviation of the times elapsed between each pair of sequential active days. It is computed
as $v_i = \mathrm{sd}(B_i)$. When $v_i = 0$, the volunteer exhibits a constant elapsed time between each pair of sequential active days; this
indicates that he/she comes back to the project with perfect periodicity. On the contrary, the larger $v_i$, the larger the deviation
in the periodicity in which the volunteer comes back to the project to perform more tasks.
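
The two duration metrics admit a similarly short sketch. The text does not say whether sd denotes the population or the
sample standard deviation; the population form is assumed here because it remains defined when a volunteer has only two
active days, and therefore a single gap. Function names and sample values are hypothetical.

```python
from datetime import date
from statistics import pstdev


def relative_activity_duration(active_days, w):
    """r_i: span covered by the active days divided by the potential time w."""
    span = (max(active_days) - min(active_days)).days + 1
    return span / w


def variation_in_periodicity(gaps):
    """v_i: standard deviation of the gaps between consecutive active days.

    Assumption: population standard deviation (statistics.pstdev).
    """
    return pstdev(gaps)


# Hypothetical volunteer: active days spanning 10 days out of 30 potential days.
A = [date(2010, 4, 1), date(2010, 4, 4), date(2010, 4, 10)]
B = [3, 6]
w = 30

r = relative_activity_duration(A, w)   # 10 / 30
v = variation_in_periodicity(B)        # gaps 3 and 6 deviate by 1.5 from their mean
```

A volunteer who returns every 7 days, for instance, has gaps `[7, 7, 7]` and a variation in periodicity of 0.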

The above engagement metrics fit well into our objective of analysing the degree of engagement and the duration of
engagement of the volunteers. Activity ratio allows us to analyse the return rate of each volunteer to the project during the period
that he/she stays contributing. Daily devoted time gives us a view of the length of the daily engagement, which is related
to the duration of the short-term engagement. Relative activity duration allows us to analyse the duration of long-term
engagement weighted by the duration of the period in which the volunteer can potentially remain linked to the project. Finally,
variation in periodicity informs us about the periodicity of return during the long-term engagement.

3.2. Clustering volunteers according to engagement metrics 

We use clustering algorithms to find out groups of volunteers who exhibit similar values for the engagement metrics. The
input to clustering algorithms is a matrix $|I| \times 4$ in which each row stands for a volunteer $i \in I$ and each column is an
engagement metric, i.e. $a$, $d$, $r$, and $v$. As the results of clustering depend on the relative values of the parameters being
clustered, a normalisation of the parameters prior to clustering would be desirable (Jain, 2008). We use range normalisation
to scale the values of the engagement metrics in the interval $[0, 1]$. The scaling formula is $x_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$, where $x$ denotes the
engagement metric and $i$ the volunteer.
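
Range normalisation of the $|I| \times 4$ matrix can be sketched as below. The metric values are made up, the function
names are hypothetical, and mapping a constant column to 0 is an assumption added to avoid a division by zero.

```python
def range_normalise(values):
    """Scale one metric column to [0, 1] using (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a column with no spread carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


def normalise_matrix(rows):
    """Normalise each column of a row-per-volunteer matrix independently."""
    columns = [range_normalise(column) for column in zip(*rows)]
    return [list(row) for row in zip(*columns)]


# Hypothetical (a, d, r, v) values for three volunteers.
rows = [
    (0.30, 0.75, 0.33, 1.5),
    (1.00, 0.25, 0.05, 0.0),
    (0.10, 2.00, 0.90, 40.0),
]
scaled = normalise_matrix(rows)
```

After scaling, variation in periodicity, which is measured in days and has a much larger range than the three ratios, no
longer dominates the distances used by the clustering algorithms.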

To identify the suitable number of clusters, we first run a hierarchical clustering algorithm and observe its dendrogram,
which yields a suitable interval to test the number of clusters. Next we run k-means, varying the number of clusters (k) in the
suggested interval and using as initial centroids the centres identified in the hierarchical clustering, which usually reduces the
impact of noise and requires less iteration time (Lu et al., 2008). We select thereafter a suitable k and evaluate the quality of
the clustering by computing the within-group sum of squares (Anderberg, 1973) and Average Silhouette width (Rousseeuw,
1987).
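
The seeding step can be sketched in plain Python as follows. This is an illustration under stated assumptions rather
than the procedure actually used: the linkage criterion is not specified in the text, so a naive centroid linkage is
assumed; the inspection of the dendrogram is replaced by a fixed k; and the data points are made up.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def agglomerate(points, k):
    """Naive agglomerative clustering (centroid linkage assumed); returns k groups."""
    groups = [[p] for p in points]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                gap = dist(centroid(groups[i]), centroid(groups[j]))
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        _, i, j = best
        groups[i] += groups.pop(j)  # merge the two closest groups (j > i)
    return groups


def kmeans(points, centres, max_iter=100):
    """Lloyd's k-means started from the given centres; returns (labels, centres)."""
    for _ in range(max_iter):
        labels = [min(range(len(centres)), key=lambda c: dist(p, centres[c]))
                  for p in points]
        updated = []
        for c in range(len(centres)):
            members = [p for p, label in zip(points, labels) if label == c]
            updated.append(centroid(members) if members else centres[c])
        if updated == centres:
            break
        centres = updated
    return labels, centres


# Hypothetical normalised (a, d, r, v) rows forming two obvious groups.
points = [
    (0.0, 0.1, 0.0, 0.0), (0.1, 0.0, 0.1, 0.0), (0.0, 0.0, 0.1, 0.1),
    (0.9, 1.0, 0.9, 1.0), (1.0, 0.9, 1.0, 0.9), (0.9, 0.9, 1.0, 1.0),
]
seeds = [centroid(group) for group in agglomerate(points, 2)]
labels, centres = kmeans(points, seeds)
```

In practice the last two lines would be repeated for every k in the interval suggested by the dendrogram, and the
resulting partitions compared with the quality measures named above.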

Within-group sum of squares measures the differences between the volunteers and the centre of the group to which they
belong. The lower the within-group sum of squares, the better the clustering. A low value indicates that volunteers clustered in the
same group exhibit similar values for the engagement metrics and that the centre of the group represents the group adequately.
Average Silhouette width, in turn, measures how well separated and cohesive the groups are. This statistic ranges from −1,
indicating a very poor clustering, to 1, indicating an excellent clustering. Struyf et al. (1997) propose the following subjective
interpretation of the silhouette statistic: between 0.71 and 1.00, a strong structure has been found; between 0.51 and 0.70,
a reasonable structure has been found; between 0.26 and 0.50, the structure is weak and could be artificial, and hence it is
recommended that additional methods of analysis are tried out; less than or equal to 0.25, no substantial structure has been
found. In this study, a silhouette statistic larger than or equal to 0.51 indicates a reasonable partition of the different patterns
of engagement exhibited by the volunteers.
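
Both quality measures can be computed directly from the points and their cluster labels. The sketch below assumes
Euclidean distance and scores a point that is alone in its cluster as 0, a common convention; the function names, the
data, and the use of the lower bound of each band in `interpret` are choices made for the example.

```python
from math import dist


def within_group_ss(points, labels):
    """Sum of squared distances between each point and the centre of its group."""
    total = 0.0
    for c in set(labels):
        members = [p for p, label in zip(points, labels) if label == c]
        centre = tuple(sum(x) / len(members) for x in zip(*members))
        total += sum(dist(p, centre) ** 2 for p in members)
    return total


def average_silhouette(points, labels):
    """Mean silhouette width over all points (Euclidean distance assumed)."""
    widths = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            widths.append(0.0)  # convention: singleton cluster or single group
            continue
        a = sum(own) / len(own)                                    # cohesion
        b = min(sum(ds) / len(ds) for ds in by_cluster.values())   # separation
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


def interpret(s):
    """Bands from Struyf et al. (1997) as quoted in the text."""
    if s >= 0.71:
        return "strong structure"
    if s >= 0.51:
        return "reasonable structure"
    if s >= 0.26:
        return "weak structure, could be artificial"
    return "no substantial structure"


# Hypothetical normalised rows with a sensible and a poor labelling.
points = [
    (0.0, 0.1, 0.0, 0.0), (0.1, 0.0, 0.1, 0.0), (0.0, 0.0, 0.1, 0.1),
    (0.9, 1.0, 0.9, 1.0), (1.0, 0.9, 1.0, 0.9), (0.9, 0.9, 1.0, 1.0),
]
good = [0, 0, 0, 1, 1, 1]
poor = [0, 1, 0, 1, 0, 1]

wss_good, wss_poor = within_group_ss(points, good), within_group_ss(points, poor)
sil_good, sil_poor = average_silhouette(points, good), average_silhouette(points, poor)
```

On this toy data the sensible labelling yields the lower within-group sum of squares and the higher silhouette width,
which is the pattern used to choose among candidate values of k.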

4. ENGAGEMENT PROFILES IN GALAXY ZOO AND THE MILKY WAY PROJECT 

In this section we use the proposed method to analyse the engagement of volunteers in two projects: Galaxy Zoo and The
Milky Way Project. We first introduce these projects and detail the data set collected from them. Then, we present the
results on the quality of clustering in these data sets and the discovered engagement profiles. Finally, we discuss the
results and their implications.

4.1. Datasets 

The data used in this study were collected from two human computation for citizen science projects: Galaxy Zoo Hubble
and The Milky Way Project. Both projects were developed and deployed on the Zooniverse (zooniverse.org) citizen science
platform.

Table 1. Descriptive statistics of engagement metrics of volunteers in the studied datasets

| Metric | The Milky Way Project | Galaxy Zoo |
|---|---|---|
| #Volunteers | 6,093 | 23,547 |
| Activity ratio | mean = 0.40, sd = 0.40 | mean = 0.33, sd = 0.38 |
| Daily devoted time | mean = 0.44, sd = 0.54 | mean = 0.32, sd = 0.40 |
| Relative activity duration | mean = 0.20, sd = 0.30 | mean = 0.23, sd = 0.29 |
| Variation in periodicity | mean = 18.27, sd = 43.31 | mean = 25.23, sd = 49.16 |

The original Galaxy Zoo (Lintott et al., 2008) was launched in July 2007 and has since been redesigned and relaunched
several times. In this project, participants are asked to answer a series of simple questions about the morphology of galaxies.
Each classifying volunteer on Galaxy Zoo is presented with a galaxy image captured by either the Sloan Digital Sky Survey
(SDSS) or the Hubble Space Telescope. A decision tree of questions is presented, with the answer to each question
represented by a fairly simple icon. The task is straightforward and no specialist knowledge is required. In this paper, we
used data from the third iteration of Galaxy Zoo: Galaxy Zoo Hubble. It was launched in April 2010 and ran until September
2012. It consisted of 9,667,586 tasks executed by 86,413 participants. In The Milky Way Project (Simpson et al., 2012),
participants are asked to draw ellipses onto the image to mark the locations of bubbles. A short online tutorial shows how
to use the tool, and examples of prominent bubbles are given. As a secondary task, users can also mark rectangular areas
of interest, which can be labelled as small bubbles, green knots, dark nebulae, star clusters, galaxies, fuzzy red objects or
“other”. Users can add as many annotations as they wish before submitting the image, at which point they are given another
image for annotation. We used data from The Milky Way Project, which was launched in December 2010 and ran until
September 2012. It consisted of 643,468 tasks executed by 23,889 participants.

Each entry in the data set refers to one task execution. Each task execution is described by project_id, task_id, user_id,
datetime. The project_id field is the name of the project. The task_id field is a unique task identifier in the project. The
user_id field is a unique volunteer identifier in the project. Finally, the datetime field indicates the date and time when the
task was executed. To form volunteers’ working sessions, we use the threshold-based methodology (Geiger and Halfaker,
2013; Mehrzadi and Feitelson, 2012; Ponciano et al., 2014b). Following this methodology, we compute the interval of time
elapsed between every two sequential task executions for each volunteer. Given these intervals, we use the method proposed
by Mehrzadi and Feitelson (2012) to identify for each volunteer a threshold that distinguishes short intervals from long
intervals. Hence, whenever the interval between the execution of two tasks is not larger than the threshold, the two tasks are
assumed to have been executed in the same working session; otherwise, the tasks are assumed to have been executed in two
different and consecutive working sessions. For more details about this methodology, see Mehrzadi and Feitelson (2012).
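
The session-forming rule can be sketched as follows. The per-volunteer threshold is computed in the study with the
method of Mehrzadi and Feitelson (2012), which is not reproduced here; in this sketch the threshold is simply passed
in as a parameter, and the 30-minute value and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def split_into_sessions(timestamps, threshold):
    """Group one volunteer's task execution times into working sessions.

    Two consecutive executions belong to the same session when the interval
    between them is not larger than the threshold.
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= threshold:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


# Hypothetical task executions of one volunteer on a single day.
base = datetime(2011, 1, 1, 10, 0)
executions = [
    base,
    base + timedelta(minutes=5),
    base + timedelta(minutes=20),
    base + timedelta(hours=4),
    base + timedelta(hours=4, minutes=10),
]
sessions = split_into_sessions(executions, threshold=timedelta(minutes=30))
print([len(s) for s in sessions])
```

With these inputs the first three executions form one session and the last two form another, since the gap between
the third and fourth executions exceeds the assumed threshold.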

In both projects, participants are considered volunteers only if they have been active on at least two days. Only
volunteers who arrived before the last quarter of the total duration of the project were considered in the analyses, i.e. the
first 502 days of The Milky Way Project and the first 630 days of the Galaxy Zoo project. As Table 1 shows, the final dataset
consists of 23,547 volunteers for Galaxy Zoo and 6,093 volunteers for The Milky Way Project; 2,485 volunteers
contributed to both projects. As shown by the descriptive statistics in this table, in both projects the volunteers differ among
themselves significantly in terms of all the engagement metrics, all of which are significantly non-normal
(Kolmogorov-Smirnov normality tests showing p-value < 0.05). The variations in the engagement metrics do not point to
any form of anomalous behaviour among the volunteers, whose behaviour can thus be considered natural throughout.
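
The two selection criteria can be sketched as a filter over the task execution records. The function name, the
project dates and the records below are hypothetical and serve only to illustrate the rule; the arrival of a
participant is taken to be the time of their first recorded task execution, which is an assumption of this sketch.

```python
from datetime import datetime, timedelta


def select_volunteers(records, project_start, project_end):
    """Return the ids of participants who (i) were active on at least two
    distinct days and (ii) arrived before the last quarter of the project."""
    cutoff = project_start + (project_end - project_start) * 0.75
    active_days, first_seen = {}, {}
    for user_id, when in records:
        active_days.setdefault(user_id, set()).add(when.date())
        if user_id not in first_seen or when < first_seen[user_id]:
            first_seen[user_id] = when
    return {
        user for user, days in active_days.items()
        if len(days) >= 2 and first_seen[user] < cutoff
    }


# Hypothetical project lasting 400 days, so the cutoff falls on day 300.
start = datetime(2011, 1, 1)
end = start + timedelta(days=400)
records = [
    ("u1", start + timedelta(days=1)),             # active on two days, early arrival
    ("u1", start + timedelta(days=5)),
    ("u2", start + timedelta(days=2, hours=1)),    # two tasks on a single day only
    ("u2", start + timedelta(days=2, hours=3)),
    ("u3", start + timedelta(days=350)),           # arrives in the last quarter
    ("u3", start + timedelta(days=360)),
]
selected = select_volunteers(records, start, end)
print(selected)
```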


4.2. Clustering 

The quality of the clustering as the number of clusters varies between 2 and 10 is shown in Figure 2 for The
Milky Way Project and in Figure 3 for Galaxy Zoo. These figures show that 5 is the number of groups that best optimises
the trade-off between the number of groups and the within-group sum of squares (Fig. 2(a) and 3(a)). This number of groups
also yields an average silhouette statistic of 0.53 in The Milky Way Project (Fig. 2(b)) and 0.51 in the Galaxy Zoo project
(Fig. 3(b)). These values indicate that a reasonable clustering structure has been found for both projects.

Figure 2. Analysis of k-means clustering in The Milky Way Project: (a) within-groups sum of squares and (b) average
silhouette statistic as the number of groups (k) is varied.


4.3. Profiles 

In order to understand the different groups uncovered by the clustering algorithm, we analyse: (i) the centroids that represent
the groups; (ii) the correlation between each pair of volunteer engagement metrics for each group; and (iii) how the groups
differ in terms of the number of volunteers and aggregate contribution. In this analysis, we assigned labels to the groups in
order to put into perspective their main engagement characteristics. Thus, the groups represent different engagement profiles
labelled as follows: hardworking engagement; spasmodic engagement; persistent engagement; lasting engagement; and
moderate engagement. The general characteristics of these profiles are shown in Figure 4, Table 2 and Table 3.

Figure 4 shows the centroids that represent each profile and how they differ in terms of engagement metrics. In each image,
the horizontal axis stands for the engagement profiles, each bar representing one engagement metric, and the vertical axis
indicates how the profiles score in the particular engagement metrics. Table 2, in turn, shows how the profiles differ in terms
of correlation between their engagement metrics. Finally, Table 3 shows how the profiles differ in terms of the number of
volunteers and how their aggregate contributions differ in terms of total working time devoted to the project. In the following
paragraphs, we elaborate on these results by analysing each engagement profile in turn.

Hardworking engagement. Volunteers who exhibit a hardworking engagement profile have a larger activity ratio and a shorter
relative activity duration than the other profiles (Fig. 4). These metrics indicate that volunteers in this profile work hard
when they come into the project, but may leave the project soon. This engagement profile also exhibits low variation in
periodicity. This means that volunteers who exhibit this engagement profile return to the project to perform more tasks
at nearly equal intervals of time, which makes the time of return of these volunteers fairly predictable. Another intrinsic
feature of this group of volunteers is a very strong negative correlation between activity ratio and variation in periodicity
(ρ(a,v) = -0.99 in both projects). This correlation indicates that the more days the volunteers return to the project to
perform tasks, the less variable are the time intervals between their active days.
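
The within-profile correlations discussed in this section are Spearman rank correlations (Table 2). A minimal sketch of
how such a coefficient can be computed is given below; it ranks both metrics (averaging the ranks of tied values) and
then computes the Pearson correlation of the ranks. The function names and the sample values are hypothetical, and the
sketch does not compute the significance test reported in the table.

```python
def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors.
    Undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical volunteers: the higher the activity ratio,
# the lower the variation in periodicity.
activity_ratio = [0.9, 0.8, 0.7, 0.6, 0.5]
variation_in_periodicity = [1.0, 2.0, 4.0, 8.0, 16.0]
rho = spearman_rho(activity_ratio, variation_in_periodicity)
print(rho)
```

Because the coefficient depends only on ranks, it captures any monotonic relationship between two metrics, which suits
engagement metrics that are not normally distributed.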

Spasmodic engagement. This engagement profile is distinguished by a relatively high activity ratio and low activity duration
(Fig. 4). This group of volunteers exhibits a positive correlation between relative activity duration and variation in periodicity.
This correlation is moderate (ρ(r,v) = 0.59) in The Milky Way Project and strong (ρ(r,v) = 0.66) in the Galaxy Zoo project
(Table 2). These correlations indicate that the longer the period of time the volunteers remain linked to the project, the more
erratic is the periodicity of their return to the project within this period. All these characteristics indicate that contributions
of volunteers exhibiting this profile typically take place during a short period of time and with irregular periodicity within
this period.

Figure 3. Analysis of k-means clustering in the Galaxy Zoo project: (a) within-groups sum of squares and (b) average
silhouette statistic as the number of groups (k) is varied.

Persistent engagement. Persistent engagement is characterised by the largest relative activity duration, the highest variation
in periodicity, and a low activity ratio (Fig. 4). Thus, volunteers with a persistent engagement profile remain linked to the
project for a long interval of time but are active on only a few days within this interval. Considering these engagement metrics,
persistent engagement may be seen as the opposite of hardworking engagement. In both projects, a small percentage of all
the volunteers fall into this engagement profile: 13.41% in The Milky Way Project and 16.07% in the Galaxy Zoo project.
Together, these volunteers account for the largest percentage of the total working time devoted to each project: 39.91% in
The Milky Way Project and 46.16% in the Galaxy Zoo project (Table 3). It is thus the most important profile in terms of
devoted working time.
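
The percentages quoted above follow directly from the counts and devoted times reported in Table 3. The short sketch
below recomputes them for The Milky Way Project; the variable and function names are chosen for this example only, and
the devoted times are used in whatever unit Table 3 reports them.

```python
# (number of volunteers, devoted time) per profile, as reported in Table 3
# for The Milky Way Project.
milky_way = {
    "hardworking": (1535, 2030.26),
    "spasmodic": (1060, 1912.05),
    "persistent": (817, 5846.58),
    "lasting": (844, 2273.10),
    "moderate": (1837, 2588.28),
}


def shares(profiles):
    """Percentage of volunteers and of devoted time accounted for by each profile."""
    total_volunteers = sum(v for v, _ in profiles.values())
    total_time = sum(t for _, t in profiles.values())
    return {
        name: (100 * v / total_volunteers, 100 * t / total_time)
        for name, (v, t) in profiles.items()
    }


result = shares(milky_way)
largest_group = max(result, key=lambda name: result[name][0])
largest_time = max(result, key=lambda name: result[name][1])
print(largest_group, largest_time)
```

The profile with the most volunteers (moderate) is thus not the one that contributes the most working time
(persistent), which is the contrast the analysis draws on.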

Lasting engagement. This is the engagement profile of volunteers exhibiting comparatively high relative activity duration
and variation in periodicity (Fig. 4). Volunteers of this kind show an activity ratio similar to that exhibited by the volunteers
who stay longer in the project (persistent engagement) but remain in the project for a shorter period of time. Finally, this
is the only engagement profile showing very weak or weak correlation between all pairs of metrics in both projects (Table 2).

Moderate engagement. As shown in Figure 4, this engagement profile has no particularly distinguishable engagement
metrics. Compared to the other profiles, moderate volunteers exhibit intermediate values in all engagement metrics. One
important characteristic of moderate engagement is a strong negative correlation between activity ratio and relative activity
duration. This correlation is ρ(a,r) = -0.74 in The Milky Way Project and ρ(a,r) = -0.76 in Galaxy Zoo (Table 2). These
values indicate that the degree of volunteer engagement in this profile falls with increased engagement duration. Hence, the
more days the volunteers return to the project to perform tasks, the shorter is the total period of time that they remain linked
to the project. This is the most common engagement profile in both studied projects: nearly 30% of the volunteers
in The Milky Way Project and 31% in Galaxy Zoo fall into this engagement profile (Table 3).

Figure 4. Score of each engagement profile in each engagement metric. Engagement profiles are represented by the
centroids of groups of volunteers identified by the k-means algorithm in (a) The Milky Way Project and (b) the Galaxy Zoo
project.


4.4. Discussion 

Our results show that volunteers in the studied projects share several similarities and differences in terms of engagement.
The identified engagement profiles put such similarities and differences into perspective. Furthermore, they help us to
better understand how the different engagement patterns result in different levels of aggregated contribution to the projects.
Several practical and research issues can be discussed in light of this analysis. We focus on four of them: profile-oriented
volunteers’ recruitment, personalised engagement strategies, psychological factors behind the engagement profiles,
and external validity of the results.

Profile-oriented volunteers’ recruitment. It is natural that scientists running citizen science projects that require human
computation want to devote more effort to recruiting volunteers who exhibit a desired engagement profile. This matters most
when they want to optimise the trade-off between the costs of recruiting volunteers and the benefit of having
all tasks of the project performed as soon as possible (Ponciano et al., 2014a). Studies have been devoted to understanding
how different disclosure campaigns (e.g. traditional media and online media (Robson et al., 2013)) differ in terms of the type
of volunteers they attract. In a similar direction, it is also important to know how different disclosure campaigns differ in
terms of the engagement profile of the volunteers they attract. For example, could a disclosure campaign based on sending
e-mails to people interested in the theme of the project (e.g., astronomy, biology) attract more persistent volunteers than
advertising campaigns in traditional media? Another important aspect that can be taken into account in optimising volunteer
recruitment is human homophily (McPherson et al., 2001), which is the principle that humans tend to be similar to their
friends in several aspects. Perhaps, taking homophily into account, one could motivate volunteers with a desired engagement
profile to recruit volunteers with a similar profile among their relatives, friends, and colleagues? Hence, new and more
effective recruitment procedures might be brought forth with increased knowledge of volunteer engagement profiles.

Personalised engagement strategies. Besides recruiting more suitable volunteers, it is also important to keep existing
volunteers engaged. The impact of management practices on volunteer engagement is a widely discussed issue in the
volunteerism literature (Clary et al., 1992; Cravens, 2000). Such practices are implemented by volunteer supervisors in a way
that takes into account the specific behaviour of each volunteer, aiming thereby at enriching the volunteer experience and
satisfying organizational needs. By showing that volunteers in human computation for citizen science projects behave very
differently from each other, this study encourages the development of a component to manage the engagement of volunteers
in such projects. This component would incorporate personalised engagement strategies (Fischer, 2001; Lopez et al., 2012;
Mao et al., 2013) derived from the volunteer engagement profiles uncovered in the present work. The component could also
monitor the contribution behaviour of each volunteer and, when necessary, automatically trigger a suitable engagement
strategy. Prospective volunteers with different behaviour profiles should be approached with different engagement strategies,
which could focus on, e.g., encouraging a reduction or an improvement of their engagement.

Table 2. Spearman ρ correlation between each pair of engagement metrics of volunteers within each engagement profile

The Milky Way Project

| Pair | Hardworking (N = 1,535) | Spasmodic (N = 1,060) | Persistent (N = 817) | Lasting (N = 844) | Moderate (N = 1,837) |
|---|---|---|---|---|---|
| ρ(a,r) | -0.24* | -0.38* | -0.14* | -0.26* | -0.74* |
| ρ(a,v) | -0.99* | -0.22* | 0.06 | 0.39* | -0.13* |
| ρ(a,d) | -0.07* | -0.05 | 0.43* | 0.37* | 0.14* |
| ρ(r,v) | 0.24* | 0.59* | -0.13* | -0.04 | 0.44* |
| ρ(r,d) | 0.14* | 0.23* | -0.09* | 0.02 | 0.01 |
| ρ(v,d) | 0.07* | 0.29* | 0.19* | 0.31* | 0.21* |

Galaxy Zoo

| Pair | Hardworking (N = 4,572) | Spasmodic (N = 3,611) | Persistent (N = 3,783) | Lasting (N = 4,250) | Moderate (N = 7,331) |
|---|---|---|---|---|---|
| ρ(a,r) | -0.30* | -0.45* | 0.15* | -0.23* | -0.76* |
| ρ(a,v) | -0.99* | -0.31* | -0.26 | 0.27* | -0.12* |
| ρ(a,d) | -0.10* | 0.03 | 0.33* | 0.30* | 0.19* |
| ρ(r,v) | 0.30* | 0.66* | -0.12* | 0.00 | 0.43* |
| ρ(r,d) | 0.07* | 0.17* | 0.08* | 0.02 | -0.05* |
| ρ(v,d) | 0.10* | 0.26* | -0.01 | 0.16* | 0.16* |

Note 1: *Significant Spearman’s ρ coefficient of correlation (p-value < 0.05).
Note 2: Moderate and strong correlations are highlighted in boldface.

Table 3. Importance of the profiles in terms of the number of volunteers and their devoted time

| Profiles | The Milky Way Project: #Volunteers | The Milky Way Project: Devoted time | Galaxy Zoo: #Volunteers | Galaxy Zoo: Devoted time |
|---|---|---|---|---|
| Hardworking | 1,535 (25.19%) | 2,030.26 (13.86%) | 4,572 (19.42%) | 4,857.49 (9.44%) |
| Spasmodic | 1,060 (17.40%) | 1,912.05 (13.05%) | 3,611 (15.34%) | 6,061.40 (11.78%) |
| Persistent | 817 (13.41%) | 5,846.58 (39.91%) | 3,783 (16.07%) | 23,757.64 (46.16%) |
| Lasting | 844 (13.85%) | 2,273.10 (15.52%) | 4,250 (18.05%) | 8,168.95 (15.87%) |
| Moderate | 1,837 (30.15%) | 2,588.28 (17.67%) | 7,331 (31.13%) | 8,621.64 (16.75%) |
| Sum | 6,093 (100%) | 14,650.27 (100%) | 23,547 (100%) | 51,467.12 (100%) |

Note: The highest number of volunteers and the longest devoted time for each project are highlighted in boldface.

Strategies can focus on encouraging a reduction of volunteer engagement when, for example, some volunteers start to
commit too much of their time to the project, which could have a negative impact on the rest of their social
life, in the worst case leading to a state of burnout (Gonzalez-Roma et al., 2006; Simpson, 2009). Fortunately, this is not
the typical situation in the two projects we have studied; even volunteers with a hardworking engagement profile typically
devote less than 21 minutes per day to the project, which is not alarming. It is important that this kind of behaviour can
be monitored and that, if necessary, strategies are put in place to deal with the potential harm that it can bring to volunteers.
When volunteers exhibit a suitable engagement profile, it is very important to recognize their contributions in order to keep
them engaged (Wilson, 2000; Rotman et al., 2012). Strategies can also focus on encouraging the improvement of volunteer
engagement when volunteers exhibit a level of engagement below the project average. This occurred frequently in the projects
we have studied. Each of the other volunteer engagement profiles shows a lower level of engagement than the moderate
engagement profile in at least one engagement metric.

There is a large body of work on strategies for encouraging contribution to online projects. Many of those strategies are
discussed by Kraut et al. (2012). Examples of strategies are (i) sending a message to the volunteers asking them for more
contributions; or (ii) providing volunteers who are online in the project with specific and highly challenging goals, e.g.
executing a number n of tasks before logging off. One non-trivial question that must be answered before putting a strategy
to work is which engagement metrics one wishes to improve. Discovering the engagement profiles of the volunteers makes it
possible to find out in which engagement metric each profile falls short, and to decide which strategy to develop for each
volunteer profile. The correlations between the engagement metrics in each engagement profile tell us how other engagement
metrics are affected when strategies are put into practice to improve one specific engagement metric. They also allow one to
assess, for example, the additional gains that could be obtained from the multiplicative effects resulting from relationships
between various metrics.

Psychological factors behind the engagement profiles. As discussed earlier, some studies have sought to understand
the motivation of volunteers to participate in human computation for citizen science projects (Raddick et al., 2010; Rotman
et al., 2012; Jennett et al., 2014). Our results open a new perspective for such studies. Given that we have shown that
volunteers exhibit different engagement profiles, new studies on motivation factors can be conducted considering the
engagement peculiarities of each profile. One major question to be answered in such studies is which motivations may lie
behind each engagement profile. This calls for a more theoretical perspective, for example: (i) considering self-determination
theory (Deci and Ryan, 2000), are persistent volunteers more extrinsically motivated than the volunteers who exhibit other
engagement profiles? or (ii) considering self-efficacy theory (Bandura, 1977), why do hardworking volunteers expend much
effort in the short term, but do not sustain their engagement in the long term? Besides complementing our understanding of
volunteer engagement, such studies may provide information about volunteer motivation and experience in the projects.

In the analysis of the profiles, we observe an opposition between degree of engagement and duration of engagement. This
opposition is clear in two main points: 1) the strong negative correlation between activity ratio and activity duration in the
moderate engagement profile; 2) the opposition between the characteristics of hardworking engagement and persistent
engagement. The negative correlation between activity ratio and activity duration in the moderate engagement profile indicates
that participating in the project with a high frequency and remaining a long time in the project are contradictory
characteristics. The same can be observed in the opposition between hardworking volunteers and persistent volunteers.
Hardworking volunteers show a higher degree of engagement, but with a shorter duration. Persistent volunteers, on the
contrary, show a lower degree of engagement but over a longer time period. It is important to understand the factors behind
this opposition and to ask if there are situations in which the volunteers would present both a high degree and a long duration
of engagement.

External validity. Here we discuss the generality of our study considering two main aspects: (i) whether the methodology
we have proposed to measure the engagement of volunteers and identify their engagement profiles can be applied in
other projects; and (ii) whether the results obtained in the case study with data collected from Galaxy Zoo and The Milky
Way Project can be generalised to other human computation for citizen science projects.

The methodology we have proposed is based on theoretical frameworks that support the study of voluntarism (Clary et al.,
1998; Wilson, 2000) and human engagement (Bandura, 1977; O’Brien and Toms, 2008). We draw on such frameworks to
derive metrics for measuring the engagement of volunteers and to uncover engagement profiles by grouping them. In
the case study conducted with data collected from Galaxy Zoo and The Milky Way Project, this methodology proved to be
satisfactory in uncovering groups of volunteers that bring to light the main similarities and differences among them. Thus,
studies seeking such a quantitative analysis of engagement can take advantage of this methodology.

Regarding the generality of the engagement profiles, there are two aspects that reinforce the idea that these types of profiles
are more generic and thus can also arise in other types of projects. First, the same set of profiles has arisen in projects that
differ significantly in terms of the tasks and the number of volunteers involved. Tasks in Galaxy Zoo are less time
consuming than tasks in The Milky Way Project (Ponciano et al., 2014b). Galaxy Zoo has almost four times more volunteers
than The Milky Way Project (Table 1), considering as volunteers those participants who have been active on at least two
different days. As most of our results and conclusions are equivalent in both projects, the differences in the design of
the tasks and in the number of volunteers have been shown not to affect the engagement profiles. Second, some profiles
describe behaviours that are common in Web systems. For example, the observed fact that a small group of volunteers
(persistent engagement) is responsible for the largest amount of contribution to the project has been shown to be valid also
elsewhere (Hargittai and Walejko, 2008; van Mierlo, 2014).

5. CONCLUSIONS AND FUTURE WORK 

In this study we answer three research questions: 1) how we can measure the level of engagement of volunteers during their
interaction with a citizen science project that uses human computation; 2) which different patterns of volunteer engagement
behaviour can be identified and specified as typical volunteer profiles; and 3) how the identified volunteer engagement
profiles can be exploited for designing strategies for increasing the engagement of volunteers in a project. We go through
existing human engagement studies and, based on the concepts and theories put forward, we propose quantitative engagement
metrics to measure different aspects of volunteer engagement, and use data mining algorithms to identify the different
volunteer profiles in terms of the engagement metrics. We use this method to analyse the engagement of volunteers in two
projects: Galaxy Zoo and The Milky Way Project.

Our results show that volunteers in the studied projects share several similarities and differences in terms of engagement.
We identify five distinct engagement profiles that put such similarities and differences into perspective. They are labelled as
follows: hardworking, spasmodic, persistent, lasting, and moderate. These profiles differ among themselves according to a
set of metrics that we have defined for measuring the degree and duration of volunteer engagement. Regarding the distribution
of the volunteers across the profiles, the highest percentage of volunteers falls into the moderate engagement profile, while
only a few volunteers exhibit a persistent engagement profile. On the other hand, persistent volunteers account for the highest
percentage of the total human effort dedicated to executing all the tasks in the project. Several discussions are drawn from
our analysis, such as profile-oriented volunteers’ recruitment, personalised engagement strategies, and psychological factors
behind the engagement profiles.

Our analysis of volunteer engagement, based on log data, yielded a powerful framework for identifying the relevant patterns
of volunteer engagement in human computation for citizen science projects. However, the current framework still presents
some shortcomings that will be addressed in future work. We have focused on the cognitive engagement of volunteers executing
human computation tasks, but it is known that volunteers also contribute by creating additional content such as posts in
project forums, which can be regarded as a form of social engagement. Assessing the behaviour of volunteers with regard
to this type of engagement is also important. Finally, future work may be dedicated to analysing volunteer engagement in
the context of other citizen science projects that use human computation. This analysis may answer the question of
whether the set of engagement profiles we have identified on the basis of the two described projects is generic enough to be
applied to the use of human computation for citizen science projects in general. Thus, we hope this study motivates further
research on volunteer engagement in this type of project.